Using phonetic patterns for detecting social cues in natural conversations

Authors

  • Johannes Wagner
  • Florian Lingenfelser
  • Elisabeth André
Abstract

Laughter and fillers such as “uhm” and “ah” are social cues expressed in human speech. Detecting and interpreting such non-linguistic events can reveal important information about the speakers’ intentions and emotional state. The INTERSPEECH 2013 Social Signals Sub-Challenge sets the task of localizing and classifying laughter and fillers in the “SSPNet Vocalization Corpus” (SVC) based on acoustics. In this paper we investigate phonetic patterns extracted from raw speech transcriptions obtained with the CMU Sphinx toolkit for speech recognition. Even though Sphinx was used out of the box, with no dedicated training on the target classes, we were able to predict laughter and filler frames in the development set with approximately 87% accuracy, measured as unweighted average Area Under the Curve (AUC). Combining our features with a set of standard features provided by the challenge organizers raised the result above 92%. When applying the combined set to the test corpus, we achieved a highest score of 87.7%, which is 4.4% above the challenge baseline.
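The abstract reports results as unweighted average AUC. As a rough illustration of what such a score measures, the sketch below computes a one-vs-rest AUC per target class using the rank-sum identity and then averages the classes with equal weight. It is a minimal sketch under stated assumptions, not the challenge's official scoring script: the function names, the "garbage" rest label, and the toy scores are all hypothetical.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    # Count (positive, negative) pairs where the positive frame scores higher;
    # ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def unweighted_average_auc(
    scores_by_class: Dict[str, Sequence[float]], labels: Sequence[str]
) -> float:
    """Mean of the one-vs-rest AUCs, each target class weighted equally."""
    aucs = []
    for cls, scores in scores_by_class.items():
        binary = [1 if y == cls else 0 for y in labels]
        aucs.append(auc(scores, binary))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: six frames with per-class classifier scores.
labels = ["laughter", "filler", "garbage", "garbage", "laughter", "filler"]
scores = {
    "laughter": [0.9, 0.2, 0.1, 0.4, 0.3, 0.1],
    "filler": [0.1, 0.8, 0.3, 0.2, 0.2, 0.7],
}
uaauc = unweighted_average_auc(scores, labels)
print(f"unweighted average AUC: {uaauc:.4f}")
```

Averaging the per-class AUCs without weighting keeps a rare class from being swamped by a frequent one, which matters when most frames contain neither laughter nor fillers.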

Related articles

Production of English Lexical Stress by Persian EFL Learners

This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...

Detecting Egregious Conversations between Customers and Virtual Agents

Virtual agents are becoming a prominent channel of interaction in customer service. Not all customer interactions are smooth, however, and some can become almost comically bad. In such instances, a human agent might need to step in and salvage the conversation. Detecting bad conversations is important since disappointing customer service may threaten customer loyalty and impact revenue. In this...

Mandarin Conversation: Turn-taking Cues in Exchange Structure

This study addresses turn taking in everyday Mandarin conversation from a phonetic perspective, in particular, suprasegmental analysis. The acoustic data are based on 103 exchanges from two conversations – one structured and one free. The results show that turn final cues are mostly signaled via various falling intonation patterns in Mandarin conversation. Some anomalies exist and these are ana...

Detecting Overlapping Communities in Social Networks using Deep Learning

In network analysis, a community is typically thought of as a group of nodes with a high density of edges among themselves and a low density of edges to the rest of the network. Detecting community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...

A Phonetic Investigation of Turn-taking Cues at Multiple Unit-levels in Japanese Conversation

In this paper, we investigate acoustic, prosodic, and syntactic cues at multiple unit-levels for turn-taking in Japanese conversation, proposing an incremental and hierarchical model of turn-projection, which is applicable to both overlapping and non-overlapping speech. Based on a quantitative analysis of Japanese three-party conversations, we identify several turn-taking cues that are located ea...

Publication date: 2013